Characteristics of the Stock Market

Additional Packages

Functions from the quantmod package are used for most of the tests and calculations, and the repr package is used to resize plots for better visualization.


In [1]:
library("quantmod")
library("repr")
#Change plot size.
options(repr.plot.width=14, repr.plot.height=10)


Loading required package: xts
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.

Description of Data

The data used in this analysis consists of the closing prices of two indices, BIST100 and BISTTUM, between 2001 and 2016. The BIST 100 index covers 100 leading companies of Borsa Istanbul, while the BISTTUM index covers every publicly listed company. The selection of leading companies into BIST 100 is based mainly on trading volume and market capitalization. The constituents of BIST 100 are subject to change and are reviewed four times a year by Borsa Istanbul. The weights of the companies are calculated according to their market capitalizations. The formula for the calculation of the two indices is the same and is represented as $E_t$.

$$E_t=\frac{\sum_{i=1}^n (F_{it}/D_t) * N_{it} * H_{it} * K_{it}}{B_t}$$

$E_t$ = value of the index at time $t$; $n$ = number of companies in the index; $F_{it}$ = price of the $i$th company at $t$; $D_t$ = exchange rate used in the index at $t$; $N_{it}$ = total number of shares of the $i$th company at $t$; $H_{it}$ = free float rate of the $i$th company at $t$; $K_{it}$ = coefficient of the $i$th company at $t$; $B_t$ = denominator of the index at $t$.

The data is stored in a comma-separated values (csv) file and read into a data frame with the read.csv function.


In [2]:
closingdata <- read.csv(file="BIST.csv", header=TRUE, sep=",")

The data frame is converted into an Extensible Time Series (xts) object for easier calculation and better compatibility with the quantmod package. The date column is first extracted and then deleted from the data frame so that the data transfers correctly into the xts object.


In [3]:
date <- strptime(closingdata$Date, format = "%d/%m/%Y")
closingdata <- closingdata[,-1]
xclosingdata <- xts(closingdata,date)
XU100 <- xclosingdata$XU100
XUTUM <- xclosingdata$XUTUM

Plotting XU100 and XUTUM over the periods 2001/2008 and 2008/2016.


In [4]:
par(mfrow=c(2,1))
plot(XU100["2001/2008"],xlab="", ylab="",main="XU100 and XUTUM between 2001 and 2008")
par(new=T)
plot(as.vector(XUTUM["2001/2008"]),type="l",col="blue",xaxt="n", yaxt="n",ylab="",xlab="",main="")
axis(4,col.axis="blue",col="blue")
legend("topleft", c("XU100", "XUTUM"), lty=c(1,1), col=c("black", "blue"), bg="white",cex=0.7)

plot(XU100["2008/2016"],xlab="", ylab="",main="XU100 and XUTUM between 2008 and 2016")
par(new=T)
plot(as.vector(XUTUM["2008/2016"]),type="l",col="blue",xaxt="n", yaxt="n",ylab="",xlab="",main="")
axis(4,col.axis="blue",col="blue")
legend("topleft", c("XU100", "XUTUM"), lty=c(1,1), col=c("black", "blue"), bg="white",cex=0.7)


Statistics

Null Hypothesis

The null hypothesis ($H_0$) states that the compared samples are drawn from the same population with regard to the outcome variable. This means that any observed differences in the dependent variable (outcome) are due to sampling error; the independent variable does not make a difference.

Research Hypothesis

The research hypothesis ($H_1$) is what we expect to happen, our prediction. It is also called the alternative hypothesis because it is an alternative to the null hypothesis. Technically, the claim of the research hypothesis is that with respect to the outcome variable, our samples are from different populations.

Example: If we predict that exercise results in better weight loss, we are predicting that after the treatment (exercise), the treated sample truly differs from the untreated one and therefore comes from a different population.

$H_0$ in this case would be that exercise is unrelated to weight loss, and $H_1$ would be that exercise leads to weight loss.

P - Value

The p-value determines whether or not we reject the null hypothesis. It provides an estimate of how often we would obtain the observed result by chance if the null hypothesis were in fact true. If the p-value is small, we reject the null hypothesis and accept the alternative hypothesis that the samples truly differ with regard to the outcome. If the p-value is large, we simply fail to reject the null hypothesis.
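As a quick illustration (with made-up numbers, not the index data), comparing two samples whose means are essentially equal yields a large p-value, so we fail to reject $H_0$:

```r
# Two hypothetical samples drawn around the same mean.
x <- c(5.1, 4.9, 5.0, 5.2, 4.8)
y <- c(5.0, 5.1, 4.9, 5.05, 4.95)
p <- t.test(x, y)$p.value
p  # large p-value: fail to reject the null hypothesis
```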

Hypothesis of our Analysis

In this analysis we use the time intervals 2001/2008 and 2008/2016 because there was a global financial crisis in 2008. The main goal of this section is to determine, from a statistical standpoint, whether the global crisis caused the market structure to change.

To do so, our null hypothesis $H_0$ is that there is no change in the market structure because of the crisis. The alternative hypothesis $H_1$ is that the market structure changed under the effect of the crisis.

XU100 and XUTUM will be tested against each other in the same time frames in order to determine different effects on the top companies and the whole stock market.

Analysis

Log Returns

Instead of closing price series, log returns will be used for further statistical tests and calculations.

The advantage of returns over closing prices is normalization: by measuring all variables in a comparable metric, returns make it possible to evaluate analytic relationships among variables even though they originate from price series of very different magnitudes.

$$ R_t = \frac{P_t - P_{t-1}}{P_{t-1}} $$

The Taylor expansion of $\log(1+x)$ is $x-\frac{x^2}{2}+\frac{x^3}{3} + O(x^4)$.

When $x$ is a small number, $\log(1+x) \approx x$.

Substituting $R_t$ gives us $\log(1+R_t) \approx R_t$:

$$ \log(1+R_t) = \log\left(1+\frac{P_t - P_{t-1}}{P_{t-1}}\right) = \log\frac{P_t}{P_{t-1}} = \log{P_t} - \log{P_{t-1}} \approx R_t $$

$R_t$ = return at $t$; $P_t$ = closing price at $t$.
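A minimal sketch (on hypothetical prices) showing that log returns, computed as differences of log prices, track simple returns closely when daily moves are small:

```r
# Hypothetical closing prices.
p <- c(100, 101, 100.5, 102)
simple <- diff(p) / head(p, -1)  # R_t = (P_t - P_{t-1}) / P_{t-1}
logret <- diff(log(p))           # log(P_t) - log(P_{t-1})
max(abs(simple - logret))        # tiny for small daily moves
```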

Log Return Distributions of XU100 and XUTUM

Summaries are very useful for getting a first look at the data before analyzing it.


In [5]:
summary(dailyReturn(log(XU100["2001/2008"])))
summary(dailyReturn(log(XU100["2008/2016"])))


     Index                     daily.returns       
 Min.   :2001-01-02 00:00:00   Min.   :-2.201e-02  
 1st Qu.:2003-01-02 06:00:00   1st Qu.:-1.259e-03  
 Median :2005-01-10 12:00:00   Median : 7.182e-05  
 Mean   :2005-01-03 10:35:09   Mean   : 5.717e-05  
 3rd Qu.:2007-01-07 06:00:00   3rd Qu.: 1.412e-03  
 Max.   :2008-12-31 00:00:00   Max.   : 1.365e-02  
     Index                     daily.returns       
 Min.   :2008-01-02 00:00:00   Min.   :-9.738e-03  
 1st Qu.:2010-03-31 18:00:00   1st Qu.:-7.676e-04  
 Median :2012-06-27 12:00:00   Median : 6.592e-05  
 Mean   :2012-06-30 17:11:22   Mean   : 1.543e-05  
 3rd Qu.:2014-10-01 06:00:00   3rd Qu.: 8.617e-04  
 Max.   :2016-12-30 00:00:00   Max.   : 1.168e-02  

In [6]:
summary(dailyReturn(log(XUTUM["2001/2008"])))
summary(dailyReturn(log(XUTUM["2008/2016"])))


     Index                     daily.returns       
 Min.   :2001-01-02 00:00:00   Min.   :-2.185e-02  
 1st Qu.:2003-01-02 06:00:00   1st Qu.:-1.193e-03  
 Median :2005-01-10 12:00:00   Median : 8.161e-05  
 Mean   :2005-01-03 10:35:09   Mean   : 5.967e-05  
 3rd Qu.:2007-01-07 06:00:00   3rd Qu.: 1.351e-03  
 Max.   :2008-12-31 00:00:00   Max.   : 1.307e-02  
     Index                     daily.returns       
 Min.   :2008-01-02 00:00:00   Min.   :-9.735e-03  
 1st Qu.:2010-03-31 18:00:00   1st Qu.:-7.251e-04  
 Median :2012-06-27 12:00:00   Median : 7.232e-05  
 Mean   :2012-06-30 17:11:22   Mean   : 1.763e-05  
 3rd Qu.:2014-10-01 06:00:00   3rd Qu.: 8.214e-04  
 Max.   :2016-12-30 00:00:00   Max.   : 1.123e-02  

Looking at the summaries, between 2001 and 2008 the mean log return of XU100 was $5.717\times10^{-5}$ and the median was $7.182\times10^{-5}$; in the same period the XUTUM mean and median were $5.967\times10^{-5}$ and $8.161\times10^{-5}$.

Between 2008 and 2016 the mean log return of XU100 was $1.543\times10^{-5}$ and the median was $6.592\times10^{-5}$; in the same period the mean and median of XUTUM were $1.763\times10^{-5}$ and $7.232\times10^{-5}$.

Within each period the means of XU100 and XUTUM are very close to each other; the mean returns of the second period, however, are substantially lower than those of the first.


In [7]:
par(mfrow=c(2,2))
plot(density(dailyReturn(log(XU100["2001/2008"]))),type="h", main="XU100:2001/2008")
plot(density(dailyReturn(log(XU100["2008/2016"]))),type="h", main="XU100:2008/2016")
plot(density(dailyReturn(log(XUTUM["2001/2008"]))),type="h", main="XUTUM:2001/2008")
plot(density(dailyReturn(log(XUTUM["2008/2016"]))),type="h", main="XUTUM:2008/2016")


T - Test

The t-test is used to determine whether the means of two groups differ. The classical version assumes that the two groups are sampled from normal distributions with equal variances; R's t.test defaults to the Welch variant, which drops the equal-variance assumption. The null hypothesis of the t-test is that the sample means are equal, and the alternative is that they are not.

In our case we will use the t-test to determine whether there really is a structural difference between the two periods of time.
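As a sanity check (with illustrative numbers), the Welch statistic reported by t.test can be reproduced by hand from the group means and variances:

```r
# Two hypothetical samples of different sizes.
x <- c(2.1, 1.9, 2.4, 2.0, 2.2)
y <- c(1.8, 1.7, 2.0, 1.6)
# Welch t statistic: difference in means over its standard error,
# using each group's own variance (no equal-variance assumption).
t_manual <- (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y))
t_builtin <- unname(t.test(x, y)$statistic)
c(t_manual, t_builtin)  # identical
```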


In [8]:
t.test(dailyReturn(log(XUTUM["2008/2016"])),dailyReturn(log(XUTUM["2001/2008"])))


Warning message in tstat + c(-cint, cint):
“Recycling array of length 1 in array-vector arithmetic is deprecated.
  Use c() or as.vector() instead.
”Warning message in cint * stderr:
“Recycling array of length 1 in vector-array arithmetic is deprecated.
  Use c() or as.vector() instead.
”
	Welch Two Sample t-test

data:  dailyReturn(log(XUTUM["2008/2016"])) and dailyReturn(log(XUTUM["2001/2008"]))
t = -0.65751, df = 3219.4, p-value = 0.5109
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.673913e-04  8.331797e-05
sample estimates:
   mean of x    mean of y 
1.763454e-05 5.967121e-05 

In [9]:
t.test(dailyReturn(log(XU100["2008/2016"])),dailyReturn(log(XU100["2001/2008"])))


	Welch Two Sample t-test

data:  dailyReturn(log(XU100["2008/2016"])) and dailyReturn(log(XU100["2001/2008"]))
t = -0.63164, df = 3231, p-value = 0.5277
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.713231e-04  8.783548e-05
sample estimates:
   mean of x    mean of y 
1.542995e-05 5.717374e-05 

In [10]:
t.test(dailyReturn(log(XUTUM["2001/2008"])),dailyReturn(log(XU100["2001/2008"])))


	Welch Two Sample t-test

data:  dailyReturn(log(XUTUM["2001/2008"])) and dailyReturn(log(XU100["2001/2008"]))
t = 0.031311, df = 3998, p-value = 0.975
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0001538836  0.0001588785
sample estimates:
   mean of x    mean of y 
5.967121e-05 5.717374e-05 

In [11]:
t.test(dailyReturn(log(XUTUM["2008/2016"])),dailyReturn(log(XU100["2008/2016"])))


	Welch Two Sample t-test

data:  dailyReturn(log(XUTUM["2008/2016"])) and dailyReturn(log(XU100["2008/2016"]))
t = 0.048189, df = 4519.7, p-value = 0.9616
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.748520e-05  9.189437e-05
sample estimates:
   mean of x    mean of y 
1.763454e-05 1.542995e-05 

The p-values of these t-tests are far too high to reject the null hypothesis: there is no statistical evidence that the mean log returns differ, either across periods or between the two indices. Note also that the t-test tells us nothing about whether the data are normally distributed.

Normalization

It is generally assumed that the log returns of equities and assets are normally distributed. To reach more robust results in this analysis, we normalize the log returns so that their distributions can be compared on a common scale.

The following formula is used for our normalization process.

$$ X_{normalized} = \frac{X-mean(X)}{\sigma_X} $$
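A quick check (on arbitrary numbers) that this transformation yields mean 0 and standard deviation 1:

```r
# Standardize a vector: subtract the mean, divide by the standard deviation.
normalize <- function(x) (x - mean(x)) / sd(x)
z <- normalize(c(10, 12, 9, 14, 11))
c(mean(z), sd(z))  # approximately 0 and 1
```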

In [12]:
XU100_0108 <- (dailyReturn(log(XU100["2001/2008"])) - mean(dailyReturn(log(XU100["2001/2008"])))) / sd(dailyReturn(log(XU100["2001/2008"])))
XU100_0816 <- (dailyReturn(log(XU100["2008/2016"])) - mean(dailyReturn(log(XU100["2008/2016"])))) / sd(dailyReturn(log(XU100["2008/2016"])))
XUTUM_0108 <- (dailyReturn(log(XUTUM["2001/2008"])) - mean(dailyReturn(log(XUTUM["2001/2008"])))) / sd(dailyReturn(log(XUTUM["2001/2008"])))
XUTUM_0816 <- (dailyReturn(log(XUTUM["2008/2016"])) - mean(dailyReturn(log(XUTUM["2008/2016"])))) / sd(dailyReturn(log(XUTUM["2008/2016"])))

In [13]:
par(mfrow=c(2,2))
plot(density(XU100_0108),type="h", main="XU100:2001/2008")
plot(density(XU100_0816),type="h", main="XU100:2008/2016")
plot(density(XUTUM_0108),type="h", main="XUTUM:2001/2008")
plot(density(XUTUM_0816),type="h", main="XUTUM:2008/2016")


The means are fixed to 0 and the standard deviations to 1 in order to fully compare the distributions of the log returns of XUTUM and XU100.


In [14]:
summary(XU100_0108)
summary(XU100_0816)


     Index                     daily.returns      
 Min.   :2001-01-02 00:00:00   Min.   :-8.606904  
 1st Qu.:2003-01-02 06:00:00   1st Qu.:-0.513456  
 Median :2005-01-10 12:00:00   Median : 0.005715  
 Mean   :2005-01-03 10:35:09   Mean   : 0.000000  
 3rd Qu.:2007-01-07 06:00:00   3rd Qu.: 0.528680  
 Max.   :2008-12-31 00:00:00   Max.   : 5.301597  
     Index                     daily.returns     
 Min.   :2008-01-02 00:00:00   Min.   :-6.22119  
 1st Qu.:2010-03-31 18:00:00   1st Qu.:-0.49950  
 Median :2012-06-27 12:00:00   Median : 0.03221  
 Mean   :2012-06-30 17:11:22   Mean   : 0.00000  
 3rd Qu.:2014-10-01 06:00:00   3rd Qu.: 0.53984  
 Max.   :2016-12-30 00:00:00   Max.   : 7.44245  

In [15]:
summary(XUTUM_0108)
summary(XUTUM_0816)


     Index                     daily.returns      
 Min.   :2001-01-02 00:00:00   Min.   :-8.823072  
 1st Qu.:2003-01-02 06:00:00   1st Qu.:-0.504329  
 Median :2005-01-10 12:00:00   Median : 0.008836  
 Mean   :2005-01-03 10:35:09   Mean   : 0.000000  
 3rd Qu.:2007-01-07 06:00:00   3rd Qu.: 0.520201  
 Max.   :2008-12-31 00:00:00   Max.   : 5.238941  
     Index                     daily.returns     
 Min.   :2008-01-02 00:00:00   Min.   :-6.45775  
 1st Qu.:2010-03-31 18:00:00   1st Qu.:-0.49184  
 Median :2012-06-27 12:00:00   Median : 0.03621  
 Mean   :2012-06-30 17:11:22   Mean   : 0.00000  
 3rd Qu.:2014-10-01 06:00:00   3rd Qu.: 0.53222  
 Max.   :2016-12-30 00:00:00   Max.   : 7.42704  

Kolmogorov - Smirnov Test

The Kolmogorov–Smirnov test is a test on probability distributions, used either to compare a sample with a reference probability distribution (one-sample test) or to compare two samples (two-sample K–S test) in order to determine whether they come from the same distribution.

In this case, two sample KS test is used to determine the differences between normalized log return distributions of XU100 and XUTUM in specified periods.

The D statistic of this test measures how far apart the two samples' distributions are: it is the largest distance between their empirical distribution functions. The closer D is to 0, the more plausible it is that the two samples come from the same distribution.

$$ D_{n,m} = \max_x \left| F_{1,n}(x) - F_{2,m}(x) \right| $$

where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the two samples.
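To make the statistic concrete, here is the two-sample D computed by hand (on small made-up samples with no ties) as the largest gap between the two empirical distribution functions, matching ks.test:

```r
# Two small hypothetical samples.
x <- c(1.2, 2.5, 0.3, 1.9, 3.1)
y <- c(0.8, 2.2, 1.5, 2.9)
pooled <- sort(c(x, y))
# D = max |ECDF_x - ECDF_y| evaluated over the pooled observations.
d_manual <- max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))
d_builtin <- unname(ks.test(x, y)$statistic)
c(d_manual, d_builtin)  # both 0.2
```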

In [16]:
ks.test(XUTUM_0108,XUTUM_0816)


	Two-sample Kolmogorov-Smirnov test

data:  XUTUM_0108 and XUTUM_0816
D = 0.03455, p-value = 0.1582
alternative hypothesis: two-sided

In [17]:
ks.test(XU100_0108, XU100_0816)


	Two-sample Kolmogorov-Smirnov test

data:  XU100_0108 and XU100_0816
D = 0.030929, p-value = 0.2614
alternative hypothesis: two-sided

In [18]:
ks.test(XU100_0108, XUTUM_0108)


	Two-sample Kolmogorov-Smirnov test

data:  XU100_0108 and XUTUM_0108
D = 0.1029, p-value = 1.246e-09
alternative hypothesis: two-sided

In [19]:
ks.test(XU100_0816, XUTUM_0816)


	Two-sample Kolmogorov-Smirnov test

data:  XU100_0816 and XUTUM_0816
D = 0.074647, p-value = 6.642e-06
alternative hypothesis: two-sided

For a given index, the distributions over 2001/2008 and 2008/2016 are not significantly different: the p-values (0.16 and 0.26) do not allow us to reject the null hypothesis. On the other hand, within identical time frames we can safely reject the null hypothesis and conclude that the distributions of XUTUM and XU100 differ from each other.

Volatility of XU100 and XUTUM

Volatility refers to the amount of uncertainty or risk about the size of changes in a security's value. A higher volatility means that a security's value can potentially be spread out over a larger range of values. This means that the price of the security can change dramatically over a short time period in either direction. A lower volatility means that a security's value does not fluctuate dramatically, but changes in value at a steady pace over a period of time.

Monthly Volatility of XU100 and XUTUM

The formula used in the volatility function from the TTR package, where $R_i = \log{\frac{C_i}{C_{i-1}}}$ and $\bar{R} = \frac{R_1 + R_2 + \dots + R_{n-1}}{n-1}$:

$$ \sigma_{cl}=\sqrt{\frac{N}{n-2}\sum_{i=1}^{n-1}(R_i - \bar{R})^2} $$
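The formula can be sketched directly in R (on hypothetical prices); note that it reduces to $\sqrt{N}$ times the sample standard deviation of the $n-1$ log returns:

```r
# Hypothetical closing prices; N = annualization factor, n = window length.
p <- c(100, 102, 101, 105, 104, 106, 103, 107)
r <- diff(log(p))                              # R_i = log(C_i / C_{i-1})
n <- length(p); N <- 252
sigma_cl <- sqrt(N / (n - 2) * sum((r - mean(r))^2))
c(sigma_cl, sqrt(N) * sd(r))                   # identical values
```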


In [20]:
V_XU100_0108 <- volatility(XU100["2001/2008"],N=286,n=24,calc="close")
V_XU100_0816 <- volatility(XU100["2008/2016"],N=286,n=24,calc="close")
V_XUTUM_0108 <- volatility(XUTUM["2001/2008"],N=286,n=24,calc="close")
V_XUTUM_0816 <- volatility(XUTUM["2008/2016"],N=286,n=24,calc="close")

par(mfrow=c(2,2))
plot(density(na.omit(V_XU100_0108)),type="h", main="XU100:2001/2008")
plot(density(na.omit(V_XU100_0816)),type="h", main="XU100:2008/2016")
plot(density(na.omit(V_XUTUM_0108)),type="h", main="XUTUM:2001/2008")
plot(density(na.omit(V_XUTUM_0816)),type="h", main="XUTUM:2008/2016")



In [21]:
summary(V_XUTUM_0108)
summary(V_XUTUM_0816)


     Index                      V_XUTUM_0108   
 Min.   :2001-01-02 00:00:00   Min.   :0.1312  
 1st Qu.:2003-01-02 06:00:00   1st Qu.:0.2365  
 Median :2005-01-10 12:00:00   Median :0.3199  
 Mean   :2005-01-03 10:35:09   Mean   :0.3630  
 3rd Qu.:2007-01-07 06:00:00   3rd Qu.:0.4523  
 Max.   :2008-12-31 00:00:00   Max.   :1.1854  
                               NA's   :23      
     Index                      V_XUTUM_0816   
 Min.   :2008-01-02 00:00:00   Min.   :0.1036  
 1st Qu.:2010-03-31 18:00:00   1st Qu.:0.1807  
 Median :2012-06-27 12:00:00   Median :0.2227  
 Mean   :2012-06-30 17:11:22   Mean   :0.2530  
 3rd Qu.:2014-10-01 06:00:00   3rd Qu.:0.2876  
 Max.   :2016-12-30 00:00:00   Max.   :0.7726  
                               NA's   :23      

In [22]:
summary(V_XU100_0108)
summary(V_XU100_0816)


     Index                      V_XU100_0108   
 Min.   :2001-01-02 00:00:00   Min.   :0.1409  
 1st Qu.:2003-01-02 06:00:00   1st Qu.:0.2477  
 Median :2005-01-10 12:00:00   Median :0.3333  
 Mean   :2005-01-03 10:35:09   Mean   :0.3782  
 3rd Qu.:2007-01-07 06:00:00   3rd Qu.:0.4687  
 Max.   :2008-12-31 00:00:00   Max.   :1.2080  
                               NA's   :23      
     Index                      V_XU100_0816   
 Min.   :2008-01-02 00:00:00   Min.   :0.1095  
 1st Qu.:2010-03-31 18:00:00   1st Qu.:0.1891  
 Median :2012-06-27 12:00:00   Median :0.2344  
 Mean   :2012-06-30 17:11:22   Mean   :0.2634  
 3rd Qu.:2014-10-01 06:00:00   3rd Qu.:0.2959  
 Max.   :2016-12-30 00:00:00   Max.   :0.7942  
                               NA's   :23      

Looking at the summaries, between 2001 and 2008 the mean volatility of XU100 was $0.3782$ and the median was $0.3333$; in the same period the XUTUM mean and median were $0.3630$ and $0.3199$.

Between 2008 and 2016 the mean volatility of XU100 was $0.2634$ and the median was $0.2344$; in the same period the mean and median of XUTUM were $0.2530$ and $0.2227$.

The mean volatilities dropped markedly from the first period to the second for both XUTUM and XU100. There is also a smaller, but noticeable, difference between the mean volatilities of XU100 and XUTUM within the same time period.

In order to test our hypothesis we will use t-test on volatility.


In [23]:
t.test(V_XU100_0108,V_XU100_0816)


	Welch Two Sample t-test

data:  V_XU100_0108 and V_XU100_0816
t = 25.242, df = 3320.2, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1059122 0.1237511
sample estimates:
mean of x mean of y 
0.3781990 0.2633673 

In [24]:
t.test(V_XUTUM_0108,V_XUTUM_0816)


	Welch Two Sample t-test

data:  V_XUTUM_0108 and V_XUTUM_0816
t = 24.679, df = 3308.5, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1012597 0.1187381
sample estimates:
mean of x mean of y 
0.3629968 0.2529979 

In [25]:
t.test(V_XU100_0108,V_XUTUM_0108)


	Welch Two Sample t-test

data:  V_XU100_0108 and V_XUTUM_0108
t = 2.7947, df = 3954.6, p-value = 0.00522
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.00453747 0.02586698
sample estimates:
mean of x mean of y 
0.3781990 0.3629968 

In [26]:
t.test(V_XU100_0816,V_XUTUM_0816)


	Welch Two Sample t-test

data:  V_XU100_0816 and V_XUTUM_0816
t = 3.1305, df = 4477.3, p-value = 0.001756
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.003875498 0.016863345
sample estimates:
mean of x mean of y 
0.2633673 0.2529979 

Looking at the p-values, we can safely reject the null hypothesis when comparing different time frames and conclude that the volatility distributions of the two periods have different means. The t-test establishes a difference in means but says nothing about whether the distributions are normal. Between XUTUM and XU100 within the same time frame the difference in mean volatility is much smaller, though still statistically significant.

Conclusion

In our analysis we found statistically significant evidence that the volatilities of XU100 and XUTUM changed after the crisis, and that the two indices differ from each other within the same period.

The KS test on normalized log returns showed no significant change in the return distribution of either index between the time frames 2001/2008 and 2008/2016, but it did show that the distributions of XU100 and XUTUM differ from each other within identical time frames.

The t-test is used to analyze the distributions of volatilities across time frames. Its results clearly show that the risk profile of the market shifted after the crisis, and that the top-100 universe behaved somewhat differently from the whole stock market during the same period.

It is no surprise that the log return and volatility distributions of XUTUM and XU100 differ during the same period, as the market is more liquid and more actively traded for larger-capitalization companies. In both periods the mean volatility of XU100 is consistently higher than that of XUTUM.

All in all, the 2008 crisis affected the psychology of investors and shifted them toward a more risk-averse mindset, which lowered both the mean log returns and the volatilities of XU100 and XUTUM. From a statistical standpoint, we can safely conclude that 2008 changed the structure of Borsa Istanbul, most visibly in its volatility.

References

Borsa İstanbul Endeks ve Veri Bölümü. (2016, November). BIST PAY ENDEKSLERİ TEMEL KURALLARI. Retrieved February 15, 2018, from http://www.borsaistanbul.com/docs/default-source/endeksler/bist-pay-endeksleri-temel-kurallari.pdf?sfvrsn=12

Georgakopoulos, H. (2014). Quantitative trading with R: A practical guide to financial mathematics and statistical computing. Basingstoke: Palgrave Macmillan.

Shumway, R. H., & Stoffer, D. S. (2006). Time series analysis and its applications: With R examples. New York: Springer.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

Chan, E. P. (2009). Quantitative trading: How to build your own algorithmic trading business. Hoboken, NJ: John Wiley & Sons.